NSF PAR Search | NSF Public Access Repository

MAPoRL: Multi-agent post-co-training for collaborative large language models with reinforcement learning

Park, Chanwoo; Han, Seungju; Guo, Xingzhi; Ozdaglar, Asuman; Zhang, Kaiqing; Kim, Joo-Kyung (July 2025, Annual Meeting of the Association for Computational Linguistics (ACL))

Free, publicly-accessible full text available July 27, 2026

Do LLM agents have regret? A case study in online learning and games

Park, Chanwoo; Liu, Xiangyu; Ozdaglar, Asuman; Zhang, Kaiqing (April 2025, International Conference on Learning Representations (ICLR), 2025)

Free, publicly-accessible full text available April 24, 2026

Provable Partially Observable Reinforcement Learning with Privileged Information

Cai, Yang; Liu, Xiangyu; Oikonomou, Argyris; Zhang, Kaiqing (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Full Text Available

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

https://doi.org/10.1145/3670865.3673491

Chen, Zaiwei; Zhang, Kaiqing; Mazumdar, Eric; Ozdaglar, Asuman; Wierman, Adam (July 2024, ACM)

Full Text Available

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Chen, Zaiwei; Zhang, Kaiqing; Mazumdar, Eric; Ozdaglar, Asuman; Wierman, Adam (September 2023, Advances in neural information processing systems)

Towards Understanding Asynchronous Advantage Actor-Critic: Convergence and Linear Speedup

https://doi.org/10.1109/TSP.2023.3268475

Shen, Han; Zhang, Kaiqing; Hong, Mingyi; Chen, Tianyi (May 2023, IEEE Transactions on Signal Processing)

Full Text Available

Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

https://doi.org/10.1146/annurev-control-042920-020021

Hu, Bin; Zhang, Kaiqing; Li, Na; Mesbahi, Mehran; Fazel, Maryam; Başar, Tamer (May 2023, Annual Review of Control, Robotics, and Autonomous Systems)

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
Full Text Available

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

Tian, Yi; Zhang, Kaiqing; Tedrake, Russ; Sra, Suvrit (January 2023, Conference on Learning for Dynamics and Control)

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
Full Text Available

Convergence and optimality of policy gradient primal-dual method for constrained Markov decision processes

https://doi.org/10.23919/ACC53348.2022.9867805

Ding, Dongsheng; Zhang, Kaiqing; Basar, Tamer; Jovanovic, Mihailo R. (June 2022, 2022 American Control Conference)

Full Text Available

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

https://doi.org/10.1137/20M1347942

Zhang, Kaiqing; Hu, Bin; Başar, Tamer (January 2021, SIAM Journal on Control and Optimization)

Full Text Available